AITopics | exit time analysis

First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise

Neural Information Processing Systems, Dec-25-2025, 20:40:31 GMT

Stochastic gradient descent (SGD) has been widely used in machine learning due to its computational efficiency and favorable generalization properties. Recently, it has been empirically demonstrated that the gradient noise in several deep learning settings admits a non-Gaussian, heavy-tailed behavior. This suggests that the gradient noise can be modeled by using $\alpha$-stable distributions, a family of heavy-tailed distributions that appear in the generalized central limit theorem. In this context, SGD can be viewed as a discretization of a stochastic differential equation (SDE) driven by a L\'{e}vy motion, and the metastability results for this SDE can then be used for illuminating the behavior of SGD, especially in terms of `preferring wide minima'. While this approach brings a new perspective for analyzing SGD, it is limited in the sense that, due to the time discretization, SGD might admit a significantly different behavior than its continuous-time limit. Intuitively, the behaviors of these two systems are expected to be similar to each other only when the discretization step is sufficiently small; however, to the best of our knowledge, there is no theoretical understanding on how small the step-size should be chosen in order to guarantee that the discretized system inherits the properties of the continuous-time system. In this study, we provide formal theoretical analysis where we derive explicit conditions for the step-size such that the metastability behavior of the discrete-time system is similar to its continuous-time limit. We show that the behaviors of the two systems are indeed similar for small step-sizes and we identify how the error depends on the algorithm and problem parameters. We illustrate our results with simulations on a synthetic model and neural networks.
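The discretized system described in the abstract can be illustrated with a short simulation. The following is a minimal one-dimensional sketch, not the authors' code: it assumes the Euler-type update w_{k+1} = w_k - eta * f'(w_k) + eps * eta^(1/alpha) * S_k with S_k symmetric alpha-stable, and it draws S_k with the Chambers-Mallows-Stuck method. The function names, the noise scaling, and the choice of a symmetric interval around the starting point as the exit domain are all assumptions made for illustration.

```python
import numpy as np

def sample_symmetric_alpha_stable(alpha, size, rng):
    """Chambers-Mallows-Stuck sampler: symmetric alpha-stable, unit scale.

    Valid for 0 < alpha <= 2 with alpha != 1 (alpha = 2 gives a Gaussian
    with variance 2).
    """
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

def first_exit_step(grad, w0, radius, eta, eps, alpha, max_steps, rng):
    """Return the first iteration k at which |w_k - w0| > radius.

    Returns None if the iterate stays inside the interval for max_steps
    iterations (a censored observation).
    The eta ** (1 / alpha) noise scaling is an assumption of this sketch.
    """
    w = w0
    noise = sample_symmetric_alpha_stable(alpha, max_steps, rng)
    for k in range(max_steps):
        w = w - eta * grad(w) + eps * eta ** (1.0 / alpha) * noise[k]
        if abs(w - w0) > radius:
            return k + 1
    return None
```

Calling `first_exit_step(lambda w: w, 0.0, 1.0, 0.01, 0.1, 1.5, 100000, np.random.default_rng(0))` simulates one run on the quadratic f(w) = w^2 / 2. Repeating this for several step-sizes eta and comparing eta times the mean exit step gives a rough empirical view of how the exit behavior changes with the discretization.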

exit time analysis, heavy-tailed gradient noise, stochastic gradient descent, (5 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.58)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.58)

Reviews: First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise

Neural Information Processing Systems, Jan-26-2025, 08:24:01 GMT

Regarding Reviewer #1's concern about making theory, I tend to be open-minded, since I cannot find solid evidence that the paper is making theory only. Regarding Reviewer #4's comment that the paper over-claims the result it proves, my take is as follows. First, for many problems, the true local minima enjoy a flat basin. A well-known example is the following paper: McGoff, Kevin A., et al. "The Local Edge Machine: inference of dynamic models of gene regulation." Second, the authors have explained the motivation for using the Lévy process to model the noise.

exit time analysis, heavy-tailed gradient noise, stochastic gradient descent, (6 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.38)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.85)

Reviews: First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise

Neural Information Processing Systems, Jan-26-2025, 08:23:51 GMT

The reviewers liked the paper and appreciated the authors' feedback. The authors should implement all of the reviewers' recommendations in the final version of the paper.

exit time analysis, heavy-tailed gradient noise, stochastic gradient descent, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.85)

First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise

Neural Information Processing Systems, Oct-10-2024, 17:07:04 GMT

Stochastic gradient descent (SGD) has been widely used in machine learning due to its computational efficiency and favorable generalization properties. Recently, it has been empirically demonstrated that the gradient noise in several deep learning settings admits a non-Gaussian, heavy-tailed behavior. This suggests that the gradient noise can be modeled by using $\alpha$-stable distributions, a family of heavy-tailed distributions that appear in the generalized central limit theorem. In this context, SGD can be viewed as a discretization of a stochastic differential equation (SDE) driven by a L\'{e}vy motion, and the metastability results for this SDE can then be used for illuminating the behavior of SGD, especially in terms of `preferring wide minima'. While this approach brings a new perspective for analyzing SGD, it is limited in the sense that, due to the time discretization, SGD might admit a significantly different behavior than its continuous-time limit.

exit time analysis, heavy-tailed gradient noise, stochastic gradient descent, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Exit Time Analysis for Approximations of Gradient Descent Trajectories Around Saddle Points

Dixit, Rishabh, Gurbuzbalaban, Mert, Bajwa, Waheed U.

arXiv.org Artificial Intelligence, Oct-6-2023

This paper considers the problem of understanding the exit time for trajectories of gradient-related first-order methods from saddle neighborhoods under some initial boundary conditions. Given the 'flat' geometry around saddle points, first-order methods can struggle to escape these regions in a fast manner due to the small magnitudes of gradients encountered. In particular, while it is known that gradient-related first-order methods escape strict-saddle neighborhoods, existing analytic techniques do not explicitly leverage the local geometry around saddle points in order to control behavior of gradient trajectories. It is in this context that this paper puts forth a rigorous geometric analysis of the gradient-descent method around strict-saddle neighborhoods using matrix perturbation theory. In doing so, it provides a key result that can be used to generate an approximate gradient trajectory for any given initial conditions. In addition, the analysis leads to a linear exit-time solution for gradient-descent method under certain necessary initial conditions, which explicitly bring out the dependence on problem dimension, conditioning of the saddle neighborhood, and more, for a class of strict-saddle functions.
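To make the notion of an exit time from a saddle neighborhood concrete, here is a toy sketch that counts gradient-descent iterations until the iterate leaves a ball around a strict saddle. It is not the paper's analysis: it assumes a purely quadratic function f(x) = 0.5 * sum_i h_i x_i^2 with a diagonal Hessian that has at least one negative entry, and the function name and parameters are hypothetical.

```python
import numpy as np

def gd_exit_step(x0, hessian_diag, eta, radius, max_steps):
    """Iterations of gradient descent needed to leave the ball ||x|| <= radius.

    The objective is the quadratic 0.5 * sum(h_i * x_i**2), whose saddle is
    at the origin. Returns None if the iterate is still inside the ball
    after max_steps iterations.
    """
    x = np.asarray(x0, dtype=float)
    h = np.asarray(hessian_diag, dtype=float)
    for k in range(1, max_steps + 1):
        x = x - eta * h * x  # gradient of the quadratic is h * x
        if np.linalg.norm(x) > radius:
            return k
    return None
```

In this toy model, a start that lies purely along an unstable direction with curvature -mu grows by a factor (1 + eta * mu) per step, so the exit step is the smallest integer k with |x0| * (1 + eta * mu)^k > radius. That is logarithmic in radius / |x0|, and it shows why initial conditions with a tiny unstable component exit slowly.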

exit time, saddle point, trajectory, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1093/imaiai/iaac025

arXiv: 2006.01106

Country:

North America > United States > New Jersey > Middlesex County > New Brunswick (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Massachusetts (0.04)
(5 more...)

Genre:

Research Report (0.63)
Workflow (0.46)

Industry:

Government > Regional Government > North America Government > United States Government (0.45)
Government > Military (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise

Nguyen, Thanh Huy, Simsekli, Umut, Gurbuzbalaban, Mert, Richard, Gaël

Neural Information Processing Systems, Mar-18-2020, 20:30:32 GMT

Stochastic gradient descent (SGD) has been widely used in machine learning due to its computational efficiency and favorable generalization properties. Recently, it has been empirically demonstrated that the gradient noise in several deep learning settings admits a non-Gaussian, heavy-tailed behavior. This suggests that the gradient noise can be modeled by using $\alpha$-stable distributions, a family of heavy-tailed distributions that appear in the generalized central limit theorem. In this context, SGD can be viewed as a discretization of a stochastic differential equation (SDE) driven by a L\'{e}vy motion, and the metastability results for this SDE can then be used for illuminating the behavior of SGD, especially in terms of `preferring wide minima'. While this approach brings a new perspective for analyzing SGD, it is limited in the sense that, due to the time discretization, SGD might admit a significantly different behavior than its continuous-time limit.

exit time analysis, heavy-tailed gradient noise, stochastic gradient descent, (3 more...)

Neural Information Processing Systems

Genre: Research Report (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
